21 research outputs found

Swisslink: high-precision, context-free entity linking exploiting unambiguous labels

Author: Cudré-Mauroux Philippe
Difallah Djellel Eddine
Luggen Michael
Prokofyev Roman
Publication date: 04/04/2019

Webpages are an abundant source of textual information with manually annotated entity links, and are often used as a source of training data for a wide variety of machine learning NLP tasks. However, manual annotations such as those found on Wikipedia are sparse, noisy, and biased towards popular entities. Existing entity linking systems deal with those issues by relying on simple statistics extracted from the data. While such statistics can effectively deal with noisy annotations, they introduce bias towards head entities and are ineffective for long tail (e.g., unpopular) entities. In this work, we first analyze statistical properties linked to manual annotations by studying a large annotated corpus composed of all English Wikipedia webpages, in addition to all pages from the CommonCrawl containing English Wikipedia annotations. We then propose and evaluate a series of entity linking approaches, with the explicit goal of creating highly accurate (precision > 95%) and broad annotated corpora for machine learning tasks. Our results show that our best approach achieves maximal precision at usable recall levels, and outperforms both state-of-the-art entity linking systems and human annotators.
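The abstract above does not spell out the linking procedure, so the following is only a minimal sketch of the idea suggested by the title: link a mention only when its label is unambiguous in the annotated corpus. The function names, the lowercasing step, and the toy annotations are assumptions made for illustration, not the authors' method.

```python
from collections import defaultdict


def build_label_index(annotations):
    """Map each lowercased anchor label to the set of entities it was linked to.

    `annotations` is a hypothetical list of (label, entity) pairs, e.g. as
    harvested from manually annotated links.
    """
    index = defaultdict(set)
    for label, entity in annotations:
        index[label.lower()].add(entity)
    return index


def link_unambiguous(mentions, index):
    """Link a mention only when its label points to exactly one entity."""
    links = {}
    for mention in mentions:
        candidates = index.get(mention.lower(), set())
        if len(candidates) == 1:
            links[mention] = next(iter(candidates))
    return links


# Toy data (invented for this example).
annotations = [
    ("Paris", "Paris_(France)"),
    ("Paris", "Paris_Hilton"),
    ("Fribourg", "Fribourg"),
    ("fribourg", "Fribourg"),
]
index = build_label_index(annotations)
links = link_unambiguous(["Paris", "Fribourg", "Lausanne"], index)
print(links)
```

In this sketch, ambiguous labels ("Paris") and unseen labels ("Lausanne") are skipped rather than guessed, which trades recall for precision.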

RERO DOC Digital Library

A Glimpse Far into the Future: Understanding Long-term Crowd Worker Quality

Author: Beckers Debby G.J.
Brelig Jonathan Warren
Chandler Jesse
Difallah Djellel Eddine
Kapelner Adam
Karger David R.
Krishna Ranjay
Liu Angli
Sorokin Alexander
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/11/2016

Microtask crowdsourcing is increasingly critical to the creation of extremely large datasets. As a result, crowd workers spend weeks or months repeating the exact same tasks, making it necessary to understand their behavior over these long periods of time. We utilize three large, longitudinal datasets of nine million annotations collected from Amazon Mechanical Turk to examine claims that workers fatigue or satisfice over these long periods, producing lower quality work. We find that, contrary to these claims, workers are extremely stable in their quality over the entire period. To understand whether workers set their quality based on the task's requirements for acceptance, we then perform an experiment where we vary the required quality for a large crowdsourcing task. Workers did not adjust their quality based on the acceptance threshold: workers who were above the threshold continued working at their usual quality level, and workers below the threshold self-selected themselves out of the task. Capitalizing on this consistency, we demonstrate that it is possible to predict workers' long-term quality using just a glimpse of their quality on the first five tasks. Comment: 10 pages, 11 figures, accepted CSCW 201

arXiv.org e-Print Archive

Crossref

Hippocampus: answering memory queries using transactive search

Author: Alberto Tonon
Djellel Eddine Difallah
Gianluca Demartini
Karl Aberer
Michele Catasta
Philippe Cudre-Mauroux
† Epfl-Switzerland
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2014

Memory queries denote queries where the user is trying to recall from his/her past personal experiences. Neither Web search nor structured queries can effectively answer this type of query, even when supported by Human Computation solutions. In this paper, we propose a new approach to answer memory queries that we call Transactive Search: the user-requested memory is reconstructed from a group of people by exchanging pieces of personal memories in order to reassemble the overall memory, which is stored in a distributed fashion among members of the group. We experimentally compare our proposed approach against a set of advanced search techniques including the use of Machine Learning methods over the Web of Data, online Social Networks, and Human Computation techniques. Experimental results show that Transactive Search significantly outperforms the effectiveness of existing search approaches for memory queries.

Infoscience - École polytechnique fédérale de Lausanne

CiteSeerX

Crossref

University of Queensland eSpace

Crowdsourcing in China: Exploring the Work Experience of Solo Crowdworkers and Crowdfarm Workers

Recent research highlights the potential of crowdsourcing in China. Yet very few studies explore the workplace context and experiences of Chinese crowdworkers. Those that do focus mainly on the work experiences of solo crowdworkers but do not deal with issues pertaining to the substantial number of people working in ‘crowdfarms’. This article addresses this gap as one of its primary concerns. Drawing on a study that involves 48 participants, our research explores, compares and contrasts the work experiences of solo crowdworkers to those of crowdfarm workers. Our findings illustrate that the work experiences and context of the solo workers and crowdfarm workers are substantially different with regard to their motivations, the ways they engage with crowdsourcing, the tasks they work on, and the crowdsourcing platforms they utilize. Overall, our study contributes to furthering the understanding of the work experiences of crowdworkers in China.

City Research Online

Crossref

RIT Scholar Works

SectionLinks: Mapping Orphan Wikidata Entities onto Wikipedia Sections

Author: Cudré-Mauroux Philippe
Difallah Djellel Eddine
Ostapuk Natalia
Publication date: 31/05/2021

Wikidata is a key resource for the provisioning of structured data on several Wikimedia projects, including Wikipedia. By design, all Wikipedia articles are linked to Wikidata entities; such mappings represent a substantial source of both semantic and structural information. However, only a small subgraph of Wikidata is mapped in that way: only about 10% of the sitelinks are linked to English Wikipedia, for example. In this paper, we describe a resource we have built and published to extend this subgraph and add more links between Wikidata and Wikipedia. We start from the assumption that a number of Wikidata entities can be mapped onto Wikipedia sections, in addition to Wikipedia articles. The resource we put forward contains tens of thousands of such mappings, hence considerably enriching the highly structured Wikidata graph with encyclopedic knowledge from Wikipedia.

RERO DOC Digital Library

Pick-a-crowd: tell me what you like, and I'll tell you what to do: a crowdsourcing platform for personalized human intelligence task assignment based on social networks

Author: Cudré-Mauroux Philippe
Demartini Gianluca
Difallah Djellel Eddine
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2013

Crowdsourcing makes it possible to build hybrid online platforms that combine scalable information systems with the power of human intelligence to complete tasks that are difficult to tackle for current algorithms. Examples include hybrid database systems that use the crowd to fill missing values or to sort items according to subjective dimensions such as picture attractiveness. Current approaches to Crowdsourcing adopt a pull methodology where tasks are published on specialized Web platforms where workers can pick their preferred tasks on a first-come-first-served basis. While this approach has many advantages, such as simplicity and short completion times, it does not guarantee that the task is performed by the most suitable worker. In this paper, we propose and extensively evaluate a different Crowdsourcing approach based on a push methodology. Our proposed system carefully selects which workers should perform a given task based on worker profiles extracted from social networks. Workers and tasks are automatically matched using an underlying categorization structure that exploits entities extracted from the task descriptions on one hand, and categories liked by the user on social platforms on the other hand. We experimentally evaluate our approach on tasks of varying complexity and show that our push methodology consistently yields better results than usual pull strategies.
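The push-style matching described above can be illustrated with a small sketch. It assumes that each task has already been reduced to a set of categories and that each worker profile is a set of liked categories. The Jaccard overlap score, the function names, and the sample data are stand-ins chosen for this example, since the abstract does not state the scoring function the system actually uses.

```python
def match_score(task_categories, liked_categories):
    """Jaccard overlap between a task's categories and a worker's liked categories."""
    task, liked = set(task_categories), set(liked_categories)
    if not task or not liked:
        return 0.0
    return len(task & liked) / len(task | liked)


def rank_workers(task_categories, worker_profiles, top_k=2):
    """Return up to `top_k` workers with a non-zero score, best match first."""
    scored = [
        (match_score(task_categories, categories), worker)
        for worker, categories in worker_profiles.items()
    ]
    # Sort by descending score, breaking ties by worker name for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [worker for score, worker in scored[:top_k] if score > 0]


# Hypothetical worker profiles and task categories.
worker_profiles = {
    "alice": {"football", "sports", "music"},
    "bob": {"cooking"},
    "carol": {"sports"},
}
task_categories = {"football", "sports"}
selected = rank_workers(task_categories, worker_profiles)
print(selected)
```

Under a push methodology the task would then be offered to the selected workers, rather than waiting for any worker to pick it up.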

University of Queensland eSpace